Exploratory data analysis

Data cleaning

Let's reduce the dataset by dropping the columns that won't be used in the analysis.
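A minimal sketch of this step, assuming the data is held in a pandas DataFrame. The toy rows and the column names (`id`, `view`, etc.) are hypothetical stand-ins, not the notebook's actual columns.

```python
import pandas as pd

# Toy stand-in for the housing data; all column names are hypothetical.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [221900.0, 538000.0, 180000.0],
    "bedrooms": [3, 3, 2],
    "view": [0, 0, 0],
})

# Columns assumed not to be needed for the analysis.
unused_columns = ["id", "view"]

# drop() returns a new DataFrame; the original df is left unchanged.
df_reduced = df.drop(columns=unused_columns)
print(df_reduced.columns.tolist())
```

Keeping the result in a new variable makes it easy to go back to the full dataset if a dropped column turns out to be useful later.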

Checking for missing data before proceeding to the next step
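One common way to run this check in pandas is to count the missing entries per column. The sketch below uses a small made-up DataFrame; the column names are assumptions.

```python
import pandas as pd

# Toy stand-in for the housing data; column names are hypothetical.
df = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0],
    "bedrooms": [3, 3, 2],
    "zipcode": [98178, 98125, 98028],
})

# isna() marks missing cells as True; sum() counts them per column.
missing_per_column = df.isna().sum()

# Total number of missing cells across the whole DataFrame.
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("Total missing values:", total_missing)
```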

There are no missing values in this dataset.

Changing the data types
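As an illustration only, the conversions might look like the sketch below. Which columns actually need converting is not stated here, so the `date`, `zipcode`, and `bathrooms` columns and their target types are assumptions.

```python
import pandas as pd

# Toy stand-in; column names and raw types are hypothetical.
df = pd.DataFrame({
    "date": ["2014-10-13", "2014-12-09", "2015-02-25"],
    "zipcode": [98178, 98125, 98028],
    "bathrooms": ["1.0", "2.25", "1.0"],
})

# Parse date strings into datetime values.
df["date"] = pd.to_datetime(df["date"])

# Zip codes are labels, not quantities, so store them as a categorical.
df["zipcode"] = df["zipcode"].astype("category")

# Numbers stored as text become floats.
df["bathrooms"] = df["bathrooms"].astype(float)

print(df.dtypes)
```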

Create some new variables in our dataset.

Checking outliers and data distributions

Number of bedrooms and bathrooms

The Q-Q plot shows that price is not normally distributed, and the price distribution plot shows outliers, so we should apply a log transformation.

The plots show that a log transformation suits the price variable better, since we prefer an approximately normal distribution.

Adding a logprice variable to the dataframe:
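A minimal sketch of adding the column with NumPy's natural log, assuming the price column is named `price`. The prices below are made-up values with one large outlier, and the skewness comparison is just a quick numeric check added for illustration.

```python
import numpy as np
import pandas as pd

# Made-up prices with one large outlier; the column name is an assumption.
df = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0, 604000.0, 510000.0, 7700000.0],
})

# Natural log of price; valid because all prices are strictly positive.
df["logprice"] = np.log(df["price"])

# Sample skewness before and after the transformation.
skew_raw = df["price"].skew()
skew_log = df["logprice"].skew()

print("Skewness of price:   ", skew_raw)
print("Skewness of logprice:", skew_log)
```

The transformation is reversible: `np.exp(df["logprice"])` recovers the original prices, which matters when model predictions need to be reported in the original units.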

Map Visualization

Scatter plot price vs explanatory variables

Box plot

Getting to know the locations by their zip codes

Zipcode: value_counts() tells us there are 70 distinct zip codes.

Checking whether the records are evenly or unevenly distributed across zip codes
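Both zip code checks can be sketched with `value_counts()`, assuming the column is named `zipcode`. The six rows below are toy data, so the counts are not those of the real dataset.

```python
import pandas as pd

# Toy stand-in; the real dataset has many more rows and zip codes.
df = pd.DataFrame({
    "zipcode": [98178, 98178, 98178, 98125, 98125, 98028],
})

# Number of rows per zip code, sorted from most to least frequent.
counts = df["zipcode"].value_counts()

# One entry per distinct zip code, so the length is the number of zip codes.
n_zipcodes = len(counts)

# Summary of the counts; a wide min-to-max range suggests an uneven spread.
summary = counts.describe()

print(counts)
print("Number of zip codes:", n_zipcodes)
print(summary)
```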

We can also learn the mean price, lot size, etc. by zipcode
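A per-zip-code summary like this is typically a `groupby` followed by `mean()`. The sketch below uses made-up rows, and the column names `price` and `sqft_lot` are assumptions.

```python
import pandas as pd

# Toy stand-in; column names and values are hypothetical.
df = pd.DataFrame({
    "zipcode": [98178, 98178, 98125, 98125, 98028],
    "price": [200000.0, 300000.0, 500000.0, 600000.0, 180000.0],
    "sqft_lot": [5000, 7000, 7000, 8000, 10000],
})

# Mean of each selected column within every zip code.
by_zip = df.groupby("zipcode")[["price", "sqft_lot"]].mean()

# Most expensive zip codes first.
by_zip_sorted = by_zip.sort_values("price", ascending=False)

print(by_zip_sorted)
```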

Modelling and Performance evaluation

Create the train and test split

The first step would be to create X and y
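A rough sketch of building X and y and then splitting them, using only pandas. Here the target is assumed to be `logprice`, the feature names are hypothetical, and the 80/20 ratio is an arbitrary choice. scikit-learn's `train_test_split` is a common alternative for the split itself.

```python
import numpy as np
import pandas as pd

# Toy stand-in; column names and values are hypothetical.
df = pd.DataFrame({
    "bedrooms": [3, 3, 2, 4, 3, 4, 3, 3, 3, 3],
    "bathrooms": [1.0, 2.25, 1.0, 3.0, 2.0, 4.5, 2.25, 1.5, 1.0, 2.5],
    "sqft_living": [1180, 2570, 770, 1960, 1680, 5420, 1715, 1060, 1780, 1890],
    "price": [221900.0, 538000.0, 180000.0, 604000.0, 510000.0,
              1225000.0, 257500.0, 291850.0, 229500.0, 323000.0],
})
df["logprice"] = np.log(df["price"])

# X holds the explanatory variables, y the target.
feature_columns = ["bedrooms", "bathrooms", "sqft_living"]
X = df[feature_columns]
y = df["logprice"]

# Randomly pick 80% of the rows for training; the rest form the test set.
X_train = X.sample(frac=0.8, random_state=42)
X_test = X.drop(X_train.index)

# Align the targets with the sampled rows via the shared index.
y_train = y.loc[X_train.index]
y_test = y.loc[X_test.index]

print(X_train.shape, X_test.shape)
```

Fixing `random_state` keeps the split reproducible, so model scores can be compared across runs.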